1 Project Overview

The main goal of this project was to develop efficient distributed-parallel and
workstation-cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for
implicit CFD. “Newton” refers to a quadratically convergent nonlinear iteration
that uses gradient information based on the true residual; “Krylov” refers to
an inner linear iteration that accesses the Jacobian matrix only through highly
parallelizable sparse matrix-vector products; and “Schwarz” refers to a domain
decomposition form of preconditioning the inner Krylov iterations that requires
primarily neighbor-only exchange of data between processors. Prior experience
has established that Newton-Krylov methods are competitive solvers in the CFD
context and that Krylov-Schwarz methods port well to distributed-memory
computers. The combination of these techniques into Newton-Krylov-Schwarz was
implemented in 2D and 3D unstructured Euler codes on the parallel testbeds
formerly at LaRC and on several other parallel computers operated by other
agencies or made available by vendors. Early implementations were written
directly in MPI, with parallel solvers that we adapted from legacy NASA codes
and enhanced for full NKS functionality. Later implementations were made in the
framework of the PETSc library from Argonne National Laboratory, which now
includes a pseudo-transient continuation Newton-Krylov-Schwarz solver capability
(as a result of demands we made upon PETSc during our early porting
experiences).
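The three ingredients described above can be seen working together on a toy problem. The following is a minimal, hypothetical Python/NumPy sketch, not the project's MPI or PETSc code: the model problem -u'' + u^3 = 1 on (0, 1), the function names, the number of subdomains, the overlap, and the tolerances are all illustrative assumptions. The Krylov solver (a hand-written GMRES) touches the Jacobian only through finite-difference products J(u)v ≈ [F(u + εv) - F(u)]/ε, while an explicitly assembled Jacobian is used solely to build a one-level additive Schwarz preconditioner from overlapping diagonal blocks.

```python
import numpy as np

# Hypothetical 1D model problem (an assumption, not the project's Euler codes):
#   -u'' + u^3 = 1 on (0, 1), u(0) = u(1) = 0, central differences on n interior nodes.


def residual(u, h):
    """Nonlinear residual F(u) at the interior nodes."""
    up = np.concatenate(([0.0], u, [0.0]))  # pad with the Dirichlet boundary values
    lap = (up[:-2] - 2.0 * up[1:-1] + up[2:]) / h**2
    return -lap + u**3 - 1.0


def assembled_jacobian(u, h):
    """Explicit Jacobian; used ONLY to build the Schwarz preconditioner."""
    n = u.size
    off = -np.ones(n - 1) / h**2
    return np.diag(2.0 / h**2 + 3.0 * u**2) + np.diag(off, 1) + np.diag(off, -1)


def fd_jacobian_vector(u, F, v, h):
    """Matrix-free product J(u) v ~ (F(u + eps v) - F(u)) / eps; GMRES never sees J."""
    nv = np.linalg.norm(v)
    if nv == 0.0:
        return np.zeros_like(v)
    eps = np.sqrt(np.finfo(float).eps) * (1.0 + np.linalg.norm(u)) / nv
    return (residual(u + eps * v, h) - F) / eps


def additive_schwarz(J, nsub, overlap):
    """One-level additive Schwarz: sum of exact solves on overlapping index blocks."""
    n = J.shape[0]
    cuts = np.linspace(0, n, nsub + 1).astype(int)
    blocks = []
    for k in range(nsub):
        idx = np.arange(max(cuts[k] - overlap, 0), min(cuts[k + 1] + overlap, n))
        blocks.append((idx, np.linalg.inv(J[np.ix_(idx, idx)])))

    def apply(r):
        z = np.zeros_like(r)
        for idx, block_inv in blocks:  # independent subdomain solves (one per processor)
            z[idx] += block_inv @ r[idx]
        return z

    return apply


def gmres(matvec, b, precond, tol=1e-6, maxiter=200):
    """Unrestarted right-preconditioned GMRES (Arnoldi with modified Gram-Schmidt)."""
    beta = np.linalg.norm(b)
    if beta == 0.0:
        return np.zeros_like(b)
    V = [b / beta]
    H = np.zeros((maxiter + 1, maxiter))
    for j in range(maxiter):
        w = matvec(precond(V[j]))
        for i in range(j + 1):
            H[i, j] = w @ V[i]
            w = w - H[i, j] * V[i]
        H[j + 1, j] = np.linalg.norm(w)
        rhs = np.zeros(j + 2)
        rhs[0] = beta
        y = np.linalg.lstsq(H[:j + 2, :j + 1], rhs, rcond=None)[0]
        if np.linalg.norm(H[:j + 2, :j + 1] @ y - rhs) <= tol * beta or H[j + 1, j] == 0.0:
            break
        V.append(w / H[j + 1, j])
    return precond(np.column_stack(V[:j + 1]) @ y)  # x = M^{-1} V y


def nks_solve(n=63, nsub=4, overlap=2, newton_tol=1e-10, max_newton=20):
    """Inexact Newton outer loop; returns (u, number of Newton steps taken)."""
    h = 1.0 / (n + 1)
    u = np.zeros(n)
    for it in range(max_newton):
        F = residual(u, h)
        if np.linalg.norm(F) < newton_tol:
            return u, it
        M = additive_schwarz(assembled_jacobian(u, h), nsub, overlap)
        # The lambda is consumed inside this iteration, so it sees the current u and F.
        du = gmres(lambda v: fd_jacobian_vector(u, F, v, h), -F, M)
        u = u + du
    return u, max_newton
```

For example, `u, steps = nks_solve()` returns the discrete solution and the number of Newton steps. In a genuinely parallel setting each block solve in `additive_schwarz` would run on its own processor, with neighbor exchanges for residual evaluation and overlap data plus global reductions for the GMRES inner products. The pseudo-transient continuation that the PETSc solver adds for robustness far from the solution is omitted from this sketch.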

A secondary project pursued with funding from this contract was the
development of parallel implicit solvers for acoustics, specifically in the
Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel
within the PETSc framework.
2 Papers and Book Chapters Supported in Part by the Contract

Details on the specifics of the research accomplished under partial sponsorship 
of this contract have been widely published, including in the following archival 
publications, listed in reverse chronological order. (The first four of these are 
available on the WWW in advance of publication.) 

1. Trends in Algorithms for Nonuniform Applications on Hierarchical Distributed
Architectures, D. E. Keyes, 1998, in “Computational Aerosciences in the 21st
Century” (M. D. Salas et al., eds.), Kluwer, Dordrecht (to appear).

2. How Scalable is Domain Decomposition in Practice?, D. E. Keyes, 1998
(under review for Proceedings of the 11th International Symposium on
Domain Decomposition Methods).

3. Newton-Krylov-Schwarz Methods for Aerodynamics Problems: Compressible
and Incompressible Flows on Unstructured Grids, D. K. Kaushik, D. E. Keyes
& B. F. Smith, 1998 (under review for Proceedings of the 11th International
Symposium on Domain Decomposition Methods).

4. Globalized Newton-Krylov-Schwarz Algorithms and Software for Parallel
Implicit CFD, W. D. Gropp, D. E. Keyes, L. C. McInnes & M. D. Tidriri,
1998, ICASE Technical Report 98-24 (under review for International Journal
of Supercomputer Applications).

5. Parallel Newton-Krylov-Schwarz Algorithms for the Transonic Full Potential
Equation, X.-C. Cai, W. D. Gropp, D. E. Keyes, R. G. Melvin & D. P. Young,
1998, SIAM J. Sci. Comp. 19:246-265.

6. Convergence Analysis of Pseudo-Transient Continuation, C. T. Kelley &
D. E. Keyes, 1998, SIAM J. Num. Anal. 35:508-523.

7. Additive Schwarz Methods for Hyperbolic Equations, Y. Wu, X.-C. Cai
& D. E. Keyes, 1998, in “Proceedings of the 10th International Conference
on Domain Decomposition Methods” (J. Mandel et al., eds.), AMS,
Providence, pp. 513-521.

8. On the Interaction of Architecture and Algorithm in the Domain-Based
Parallelization of an Unstructured Grid Incompressible Flow Code,
D. K. Kaushik, D. E. Keyes & B. F. Smith, 1998, in “Proceedings of the 10th
International Conference on Domain Decomposition Methods” (J. Mandel
et al., eds.), AMS, Providence, pp. 311-319.

9. Parallel Implicit PDE Computations: Algorithms and Software,
W. D. Gropp, D. E. Keyes, L. C. McInnes & M. D. Tidriri, 1997, in “Proceedings
of Parallel CFD ’97” (A. Ecer et al., eds.), North Holland, Amsterdam,
pp. 333-344.

10. Prospects for CFD on Petaflops Systems, D. E. Keyes, D. K. Kaushik &
B. F. Smith, 1997, in “CFD Review” (M. Hafez et al., eds.), Wiley, New
York (to appear).

11. Newton-Krylov-Schwarz: An Implicit Solver for CFD, X.-C. Cai,
D. E. Keyes & V. Venkatakrishnan, 1997, in “Proceedings of the Eighth
International Conference on Domain Decomposition Methods” (R. Glowinski
et al., eds.), Wiley, Chichester, pp. 387-402.

12. Newton-Krylov-Schwarz Methods: Interfacing Sparse Linear Solvers with
Nonlinear Applications, D. E. Keyes & V. Venkatakrishnan, 1996, Zeitschr.
Angew. Math. 76(Suppl. 1):147-150.

13. Application of Newton-Krylov Methodology to a Three-Dimensional
Unstructured Euler Code, E. J. Nielsen, R. W. Walters, W. K. Anderson &
D. E. Keyes, 1995, in “Proceedings of the 12th AIAA Computational Fluid
Dynamics Conference” (San Diego, June 1995), AIAA Paper 95-1733.

3 Presentations Crediting NASA Support 

Results from the project have been featured, with credits to NASA, in
presentations at the following conferences and workshops, as well as in NASA
Langley presentations and many departmental seminars:

1. Eleventh International Conference on Domain Decomposition Methods, 
Greenwich, England, July 1998. 

2. Workshop in Honor of Professor V. S. Ryaben’kii, ICOSAHOM’98, Tel 
Aviv, Israel, June 1998. 

3. Petaflops Systems Operations Review (POWR) Workshop, Bodega Bay, 
CA, June 1998. 

4. Computational Aerosciences in the 21st Century, NASA-Langley, Hampton,
VA, April 1998.

5. Workshop on Adaptive Methods for Differential Equations, Royal Institute 
of Technology (KTH), Stockholm, Sweden, March 1998. 

6. Numerical Analysis Conference in Honor of Olof B. Widlund on the Occasion
of his 60th Birthday, Courant Institute, NY, January 1998.

7. Tenth International Conference on Domain Decomposition Methods, Boulder,
CO, August 1997.

8. SIAM Annual Meeting, Stanford, CA, July 1997. 

9. Workshop on the Future of High Performance Computing, Bergen, Norway,
May 1997.
10. Workshop on Algorithms for Petaflops Computing, Williamsburg, VA,
April 1997. 

11. Eighth SIAM Conference on Parallel Processing for Scientific Computing, 
Minneapolis, MN, March 1997. 

12. Workshop on Computational Heat Transfer, Las Vegas, NV, March 1997. 

13. Workshop on Structured Adaptive Mesh Refinement, Institute for Mathematics
and its Applications (IMA), Minneapolis, MN, March 1997.

14. Ninth International Conference on Domain Decomposition Methods in 
Science and Engineering, Bergen, Norway, June 1996. 

15. Workshop on Iterative Methods, International Linear Algebra Year, CERFACS,
Toulouse, France, June 1996.

16. Conference on Iterative Methods, Copper Mountain, CO, April 1996. 

17. SIAM Annual Meeting, Charlotte, NC, October 1995. 

18. Iterative Methods for Large Scale Nonlinear Problems, Utah State University,
Logan, UT, September 1995.

19. ICIAM ’95, Hamburg, Germany, July 1995. 

20. Eighth International Conference on Domain Decomposition Methods in 
Science and Engineering, Beijing, China, May 1995. 

4 Personnel Supported by the Contract 

1. Satish Balay, M.S. in Computer Science, 1995, Old Dominion University 

2. Kumar Kareti, M.S. in Computer Science, 1995, Old Dominion University 

3. Dinesh Kaushik, currently a Ph.D. candidate

4. Jay Morris, currently a Ph.D. candidate

5. Yunhai Wu, postdoctoral fellow in Computer Science, Old Dominion University

5 Project Highlight 

The project has demonstrated the feasibility of scaling important implicit
external aerodynamics problems to the thousand-processor regime, permitting
large grids (millions of vertices) to be employed in structured and
unstructured discretizations, in both the incompressible and compressible
regimes.

A recently achieved “high water mark” stemming from the algorithmic and
software efforts undertaken under this contract is presented in histogram form
in the figure below.

[Figure: fixed-problem-size scaling histograms, arranged in rows of panels,
versus number of processors (128 to 1024).]


The fixed-problem-size scaling study shows reductions in wallclock execution
time that closely track the number of vertices per processor as the number of
processors is varied from 128 to 1024. The lower end of the range is limited by
per-node memory capacity; the upper end is not intrinsically limited, but
marginal efficiency is beginning to erode as the subdomain surface-to-volume
ratio grows.
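A rough, idealized estimate makes the surface-to-volume effect concrete. It assumes, unlike the actual unstructured partitions, that each of the p subdomains of an N-vertex grid is a cube holding n vertices, so that its surface carries about 6 n^{2/3} vertices:

```latex
n = \frac{N}{p}, \qquad
\frac{\text{surface vertices}}{\text{subdomain vertices}}
  \approx \frac{6\,n^{2/3}}{n} = 6\left(\frac{p}{N}\right)^{1/3}, \qquad
\frac{6\,(1024/N)^{1/3}}{6\,(128/N)^{1/3}} = 8^{1/3} = 2 .
```

Because neighbor communication scales with subdomain surface and computation with subdomain volume, the 8-fold increase in processor count at fixed N roughly doubles the relative communication cost per subdomain under this assumption.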

The middle row of figures shows that the computational rate per processor
remains nearly level as the problem size per processor varies over a factor
of 8. Therefore the aggregate computational rate grows nearly linearly with
the number of processors, reaching nearly 80 Gflop/s.
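The linear growth follows directly if the per-processor rate r(p) stays roughly constant at some level r̄, as the nearly level middle-row histograms suggest. The aggregate rate R(p) and the implied per-processor rate at 1024 processors are then

```latex
R(p) = p\, r(p) \approx p\, \bar{r}, \qquad
\bar{r} \lesssim \frac{80 \times 10^{9}\ \text{flop/s}}{1024} \approx 78\ \text{Mflop/s},
```

so an aggregate rate just under 80 Gflop/s on 1024 processors corresponds to a sustained rate of roughly 78 Mflop/s or slightly less per processor.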

The bottom row shows only a small degradation in convergence rate as the
number of processors is increased and the Schwarz preconditioner is divided
into finer diagonal blocks. The final figure shows implementation efficiency
(speedup per processor on a per-iteration basis) in excess of 80% over an
8-fold range in processor count.
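The efficiency figures above separate two effects: extra iterations caused by the finer Schwarz blocks, and the cost of each iteration. A minimal sketch of that bookkeeping follows, assuming the smallest run (128 processors here) as the baseline and a decomposition of overall fixed-size efficiency into an algorithmic factor (iteration counts) and an implementation factor (per-iteration speedup per processor). The `Run` record, the `efficiencies` function, and the sample numbers are hypothetical illustrations, not measurements from this study.

```python
from dataclasses import dataclass


@dataclass
class Run:
    procs: int         # number of processors p
    wall_time: float   # wallclock time of the whole solve, in seconds
    iterations: int    # iteration count needed to converge


def efficiencies(runs):
    """Map p -> (overall, algorithmic, implementation) efficiency relative to runs[0].

    overall        = (p0 * t0) / (p * t)       fixed-size parallel efficiency
    algorithmic    = I0 / I                    loss from extra iterations
    implementation = overall / algorithmic     = (p0 * t0 / I0) / (p * t / I),
                                                 i.e. efficiency per iteration
    """
    base = runs[0]
    table = {}
    for r in runs:
        overall = (base.procs * base.wall_time) / (r.procs * r.wall_time)
        algorithmic = base.iterations / r.iterations
        table[r.procs] = (overall, algorithmic, overall / algorithmic)
    return table


# Hypothetical inputs chosen only to exercise the formulas.
sample = [Run(128, 100.0, 20), Run(1024, 16.0, 22)]
for p, (ov, alg, impl) in efficiencies(sample).items():
    print(f"p={p:5d}  overall={ov:.3f}  algorithmic={alg:.3f}  implementation={impl:.3f}")
```

Under this bookkeeping, the "in excess of 80%" quantity in the text is the implementation factor, and the overall efficiency is the product of the algorithmic and implementation factors.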


6 Leveraged Activities 

The research undertaken pursuant to this contract has been aided by the
following other grants and contracts awarded to the same principal investigator
at Old Dominion University. The process of obtaining some of these other grants
was significantly enhanced by the credibility of the original contract from
NASA, which predated all of them. Hence, the funds from NASA were effectively
employed as seed funds to accomplish work of importance to NASA, as well as to
three other federal agencies.

Parallel PDE Algorithms for the Accelerated Strategic Computing Initiative.
LLNL (DOE), 10/97 - 9/98.

Development and Application of the Portable Extensible Toolkit for Scientific
Computing. ANL (DOE), 9/97 - 10/97.

A Numerical Laboratory for Multi-Model Multi-Domain Computational
Methods in Aerodynamics and Acoustics. NSF, 10/95 - 9/98.

Graduate Fellowships in Computer Science (GAANN). U.S. Dept. of
Education, 9/95 - 8/98.

Solution-Adaptive Grid Partitioning and Variable Ordering for PDEs. NSF,
7/95 - 6/98.